WIP: Remove ldap caching and move to the Zitadel v2 API #79

tlater-famedly · 2024-10-30T11:53:35Z

Closes: #24
Closes: #53
Closes: #30

WIP because there are some things to clear up still. The tests will all fail here, but with some fudging of the sources almost all tests pass.

There are a few things that need to be discussed/covered:

Multi-source config/test suite

Previously, we wanted to support specifying multiple sources simultaneously, since that came somewhat for free. With the removal of the cache, however, only the UKT source uses email addresses to identify users, and only it has a concept of "removed" users, so it fundamentally needs to be handled separately from the other sources. Supporting multi-source definitions is much trickier with this limitation, so I believe we should not do so.

This is the cause of most of the remaining failures: the test suite was written around the assumption that we can just dump all kinds of sources into one big config file. To get this over the line, I have so far put off rewriting the test suite, and instead tested with only the respective source actively usable in code.

External user ID encoding

Since we now universally use the external user IDs to uniquely identify users, and we actually pull data from Zitadel instead of just pushing, we need to re-ingest the user IDs we've written to the nickname field. Unfortunately, ldap3 silently converts byte-typed values to strings when they can be, but otherwise returns bytes, so we need to base64-encode these IDs.

So far we've only encoded values that were bytes, and left strings alone. Alas, a plain string can happen to be valid base64 itself (e.g. starttls). This means we have no way to distinguish IDs that were encoded from IDs that were not. When we get unlucky, this can make it impossible to uniquely identify a user, since decoding a stringly-typed ID can sometimes produce a valid value.

Fixing this in the current implementation is easy (either add a marker prefix to encoded strings, or encode all values, since UTF-8 strings are also valid byte strings), but backwards compatibility is a problem, since users with encoded IDs already exist in production.
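
To make the ambiguity concrete, here is a minimal, dependency-free sketch. `looks_like_base64` and `encode_id` are hypothetical helpers, not functions from this repository, and the base64 check is only a rough syntactic test (alphabet, length, padding). Hex is used for the "encode all values" option purely as an illustration:

```rust
/// Rough syntactic check for padded standard base64.
/// Hand-rolled for illustration; real code would likely use a crate.
fn looks_like_base64(s: &str) -> bool {
	let bytes = s.as_bytes();
	if bytes.is_empty() || bytes.len() % 4 != 0 {
		return false;
	}
	// At most two trailing '=' padding characters are allowed.
	let pad = bytes.iter().rev().take_while(|&&b| b == b'=').count();
	if pad > 2 {
		return false;
	}
	bytes[..bytes.len() - pad]
		.iter()
		.all(|b| b.is_ascii_alphanumeric() || *b == b'+' || *b == b'/')
}

/// "Encode all values" variant: every ID is encoded unconditionally,
/// so there is never a question of whether a stored value was encoded.
fn encode_id(raw: &[u8]) -> String {
	raw.iter().map(|b| format!("{b:02x}")).collect()
}
```

With these helpers, `looks_like_base64("starttls")` is true even though the value was never encoded, while `encode_id(b"starttls")` always yields the same unambiguous `7374617274746c73`.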

It's possible that there is no way around runs in which some users will be deleted and immediately re-created.

Zitadel version

Due to the various recent Zitadel bugs, we need to update the Zitadel version in this PR. I've currently gone with the latest (2.64.1), but maybe we want another version?

Remaining tasks:

emgrav · 2024-10-30T11:58:40Z

> It's possible that there is no way around runs in which some users will be deleted and immediately re-created.

For UKE we'll need to make sure to test this with a dry run

src/lib.rs

tlater-famedly · 2024-10-30T12:08:31Z

src/zitadel.rs

+			user.set_idp_links(vec![IdpLink::new()
+				.with_user_id(imported_user.external_user_id.to_string())
+				.with_idp_id(self.zitadel_config.idp_id.clone())
+				// TODO: Figure out if this is the correct value; empty is not permitted


This should probably be confirmed to work before we merge; we have no way to test if IDP logins actually work.

Since I ended up reading the source code to figure out how Zitadel handles IDP IDs, I also figured out what the "name" means. I originally just wrote this comment:

This refers to the "username" that the search filter defined in the IDP config resolves to. This means that for our IDP links to work, that needs to be configured to resolve to the user's email address. See: https://github.com/zitadel/zitadel/blob/250f2344c8c2292ca9b861cdd12223d0b4719d43/internal/idp/providers/ldap/session.go#L230

That should make it into the readme, though.

tlater-famedly · 2024-10-30T13:44:43Z

> > It's possible that there is no way around runs in which some users will be deleted and immediately re-created.
>
> For UKE we'll need to make sure to test this with a dry run

The UKE source should be unaffected, because it doesn't need to parse external IDs from Zitadel. But yes, it is probably prudent to do dry runs for the first sync after updating in general; this is a pretty big change.

src/lib.rs

emgrav · 2024-10-31T14:08:02Z

Remaining before this can be merged:

Some tests need to be refactored to no longer combine multiple sources.
We need to document the potential quirks of external user ID encoding, including instructions for infra on what to look for in a dry run to identify a problem before it happens.

jannden · 2024-11-03T09:04:22Z

There is a clash between CSV and LDAP at this point. When we import CSV, it wrongly deletes Zitadel users not present in the CSV (or rather users that it couldn't match).
A) Is that a desired behavior?
B) The old and new users are compared by their external_user_id, which won't match if we mix CSV and LDAP, because the two sources use different values for it: CSV uses email addresses, and LDAP uses the user_id attribute pulled from LDAP. Shall we add another field to the CSV to explicitly set a user_id that can match the LDAP user_id?

tlater-famedly · 2024-11-04T10:14:15Z

> There is a clash between CSV and LDAP at this point. When we import CSV, it wrongly deletes Zitadel users not present in the CSV (or rather users that it couldn't match).
>
> A) Is that a desired behavior?

Ideal behavior would of course be that switching between LDAP/CSV is possible, but this is not a use case we actually currently have, so while not ideal, it's probably fine for this to happen.

> B) The old and new users are compared by their external_user_id, which won't match if we mix CSV and LDAP, because the two sources use different values for it: CSV uses email addresses, and LDAP uses the user_id attribute pulled from LDAP. Shall we add another field to the CSV to explicitly set a user_id that can match the LDAP user_id?

Yes, this is already a planned feature, see #75. We should do that as a separate task, though, as mixing the sources will for now be unsupported.

We should probably make sure with product that this is indeed fine, however.
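
For anyone following along, the clash can be sketched as a plain set comparison on external IDs. This is only an illustration under assumptions: `plan_sync` is a hypothetical helper, and the actual sync logic in this PR may be structured differently.

```rust
use std::collections::BTreeSet;

/// Hypothetical planning step: compares the external IDs already in Zitadel
/// with the IDs found in the current source and returns
/// `(to_delete, to_create)`.
fn plan_sync(
	existing: &BTreeSet<String>,
	source: &BTreeSet<String>,
) -> (Vec<String>, Vec<String>) {
	// IDs only known to Zitadel are treated as removed users.
	let to_delete = existing.difference(source).cloned().collect();
	// IDs only known to the source are treated as new users.
	let to_create = source.difference(existing).cloned().collect();
	(to_delete, to_create)
}
```

If the same person was imported from LDAP under one ID and then shows up in a CSV under their email address, the two IDs never match, so that person lands in both lists: deleted and immediately re-created.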

jannden · 2024-11-04T10:40:49Z

Okay, in that case, we can merge the updated tests from: #80

codecov · 2024-11-04T11:20:29Z

Codecov Report

Attention: Patch coverage is 92.01774% with 36 lines in your changes missing coverage. Please review.

Project coverage is 91.38%. Comparing base (483af5f) to head (7b1b501).

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/lib.rs | 87.20% | 16 Missing ⚠️ |
| src/zitadel.rs | 91.57% | 15 Missing ⚠️ |
| src/sources/csv.rs | 75.00% | 4 Missing ⚠️ |
| src/main.rs | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #79      +/-   ##
==========================================
+ Coverage   86.76%   91.38%   +4.62%     
==========================================
  Files           7        8       +1     
  Lines        1352     1277      -75     
==========================================
- Hits         1173     1167       -6     
+ Misses        179      110      -69

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/config.rs | `97.76% <100.00%> (+1.45%)` | ⬆️ |
| src/sources/ldap.rs | `97.85% <100.00%> (+2.16%)` | ⬆️ |
| src/sources/ukt.rs | `86.20% <ø> (-0.53%)` | ⬇️ |
| src/user.rs | `100.00% <100.00%> (+10.29%)` | ⬆️ |
| src/main.rs | `0.00% <0.00%> (ø)` | |
| src/sources/csv.rs | `93.28% <75.00%> (-0.10%)` | ⬇️ |
| src/zitadel.rs | `93.05% <91.57%> (+17.59%)` | ⬆️ |
| src/lib.rs | `87.20% <87.20%> (ø)` | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

emgrav · 2024-11-05T09:17:45Z

When we merge this we should tag a new version and notify Niklas

src/user.rs

tests/e2e.rs

This allows using the debug impls of our User structs without worrying too much about accidentally exposing PII. Note that this means `external_user_id` should *never* contain PII, and as such we'll have to change the CSV source. An issue about this will be opened separately.
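
A minimal sketch of what such a redacting `Debug` impl can look like. The struct below is a hypothetical, trimmed-down stand-in for the real `User` type, and the `***` placeholder is an assumption:

```rust
use std::fmt;

/// Hypothetical, reduced user struct for illustration only.
struct User {
	external_user_id: String,
	email: String,
	first_name: String,
}

impl fmt::Debug for User {
	fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
		// Only the external ID is printed verbatim; everything that could
		// be PII is replaced with a fixed placeholder.
		f.debug_struct("User")
			.field("external_user_id", &self.external_user_id)
			.field("email", &"***")
			.field("first_name", &"***")
			.finish()
	}
}
```

This only helps as long as `external_user_id` itself carries no PII, which is exactly the constraint noted above.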

tlater-famedly · 2024-11-14T13:34:16Z

src/user.rs

+	/// See
+	/// https://www.notion.so/famedly/Famedly-UUID-Specification-adc576f0f2d449bba2f6f13b2611738f
+	pub fn famedly_uuid(&self) -> String {
+		Uuid::new_v5(&FAMEDLY_NAMESPACE, self.external_user_id.as_bytes()).to_string()


This is no longer correct. We need to encode the raw byte value in the UUID, not the hex-encoded string.

We should write tests that explicitly confirm this, so we don't accidentally change it again.
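
As a rough sketch of the difference (assuming the ID is stored hex-encoded, and with `hex_decode` as a hypothetical helper rather than anything from this repository): the bytes fed into the UUID hash should be the decoded ones, not the ASCII bytes of the hex string.

```rust
/// Decodes a hex string into raw bytes.
/// Returns `None` on odd length or non-hex characters.
fn hex_decode(s: &str) -> Option<Vec<u8>> {
	if s.len() % 2 != 0 {
		return None;
	}
	s.as_bytes()
		.chunks(2)
		.map(|pair| {
			let hi = (pair[0] as char).to_digit(16)?;
			let lo = (pair[1] as char).to_digit(16)?;
			Some((hi * 16 + lo) as u8)
		})
		.collect()
}

// Assumed usage with the `uuid` crate (not compiled here):
//   let raw = hex_decode(&self.external_user_id)?;
//   Uuid::new_v5(&FAMEDLY_NAMESPACE, &raw)
```

For the hex ID `6a646f65`, the decoded input is the four bytes of `jdoe`, whereas `as_bytes()` on the string would hash eight ASCII characters and produce a different UUID. A regression test could pin exactly that.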

tlater-famedly · 2024-11-14T13:34:59Z

src/zitadel.rs

-		};
+		if self.feature_flags.is_enabled(FeatureFlag::SsoLogin) {
+			user.set_idp_links(vec![IdpLink::new()
+				.with_user_id(imported_user.external_user_id.clone())


We need to use the broken base64-encoded scheme here for Zitadel to interpret the IDs "correctly".

We should probably also file an issue upstream about this?

We also need a test to confirm that this encodes IDs correctly.

tlater-famedly · 2024-11-14T14:08:24Z

tests/e2e.rs


 	let user = zitadel.get_user_by_login_name("delete_me@famedly.de").await;
 	assert!(user.is_err_and(|error| matches!(error, ZitadelError::TonicResponseError(status) if status.code() == TonicErrorCode::NotFound)));
 }

+// Currently fails because CSV uses non-hex encoded IDs, need to think
+// about how to fit this into the overall workflow
+#[ignore]


Not working yet

I suppose the CSV backend should simply also hex-encode any IDs, and we simply make this an assumption of all sources so that we can rely on this being correct all the time?

Yes, all IDs should go through the encoding
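
One property that makes hex convenient as the common encoding, and which the commit title "Use external ID encoding supporting lexicographical order" seems to point at, is that sorting the encoded strings gives the same order as sorting the raw bytes. Base64 does not have this property (`[0x00]` encodes to `AA==` and `[0xff]` to `/w==`, which sorts first). A small sketch with hypothetical helper names:

```rust
/// Lowercase hex encoding, two characters per byte.
fn hex_encode(raw: &[u8]) -> String {
	raw.iter().map(|b| format!("{b:02x}")).collect()
}

/// Returns true if sorting by raw bytes and sorting by the encoded
/// strings put the given IDs in the same order.
fn order_is_preserved(ids: &[&[u8]]) -> bool {
	let mut by_raw: Vec<&[u8]> = ids.to_vec();
	by_raw.sort();
	let mut by_hex: Vec<String> = ids.iter().map(|id| hex_encode(id)).collect();
	by_hex.sort();
	by_raw.iter().map(|id| hex_encode(id)).eq(by_hex)
}
```

Whether the sync actually relies on sorted IDs is an assumption on my side; the sketch only shows that hex would not get in the way if it does.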

tlater-famedly · 2024-11-14T14:20:21Z

tests/e2e.rs

@@ -1,17 +1,17 @@
 #![cfg(test)]


We should add one more test, to assert that SSO linking actually works. I believe Zitadel is set up to allow this already, we just need to add a test.

I have added a new test, `test_e2e_sso_linking`. Is that what you meant? Feel free to tweak it.

tlater-famedly · 2024-11-14T14:25:15Z

tests/e2e.rs

@@ -1,17 +1,17 @@
 #![cfg(test)]
+#![allow(clippy::expect_fun_call)]


We should add this to the clippy config, and to our company-wide clippy config, if we don't like that lint.

I suggest creating an issue for this and leaving the attribute explicitly in the test file for this PR.

jannden

A couple of things I noticed, meant as points of discussion.

src/main.rs

src/sources/csv.rs

src/user.rs

jannden

Good for manual tests now.

tlater-famedly requested a review from a team as a code owner October 30, 2024 11:53

tlater-famedly commented Oct 30, 2024

src/lib.rs (resolved)

tlater-famedly commented Oct 30, 2024

emgrav previously approved these changes Oct 31, 2024

src/lib.rs (resolved)

tlater-famedly dismissed emgrav’s stale review via 20a194f October 31, 2024 13:39

tlater-famedly force-pushed the tlater/remove-cache branch from 20a194f to 35148f4 Compare October 31, 2024 13:41

This was referenced Nov 4, 2024

test: Support single-source sync #80

Merged

Support single-source sync in Tests #81

Closed

emgrav previously approved these changes Nov 4, 2024

tlater-famedly dismissed emgrav’s stale review via be1346e November 4, 2024 14:36

tlater-famedly force-pushed the tlater/remove-cache branch 6 times, most recently from edd027e to 6f6db25 Compare November 4, 2024 15:46

jannden previously approved these changes Nov 4, 2024

lukaslihotzki-f mentioned this pull request Nov 5, 2024

release: v0.6.0 #85

Merged

tlater-famedly commented Nov 5, 2024

src/user.rs (outdated, resolved)

tlater-famedly dismissed jannden’s stale review via da89731 November 5, 2024 23:50

tlater-famedly force-pushed the tlater/remove-cache branch from 6f6db25 to da89731 Compare November 5, 2024 23:50

jannden mentioned this pull request Nov 8, 2024

Famedly-Sync: Refactor for ID encoding and sorting #91

Closed

jannden mentioned this pull request Nov 8, 2024

Documentation for changes and migration guide #89

Open

jannden requested changes Nov 14, 2024

tests/e2e.rs (resolved)

tlater-famedly and others added 9 commits November 14, 2024 14:07

feat!: Stop relying on a local cache to track changes

de972bb

WIP: test: Split apart multi-source oriented test setup

bdc786f

WIP: test: Update Zitadel version for test env

affe6f5

test: Support single-source sync

955460c

doc: Split apart sample configurations

61e47eb

chore: Remove bincode dependency

870e2ec

refactor: Move uuid method to user struct impl

3315952

WIP: fix!: Rework handling of binary fields from LDAP

5504703

tlater-famedly commented Nov 14, 2024

tlater-famedly force-pushed the tlater/remove-cache branch 4 times, most recently from 3b8816b to 15dca27 Compare November 14, 2024 14:06

tlater-famedly commented Nov 14, 2024

jannden and others added 4 commits November 14, 2024 15:15

refactor: Use external ID encoding supporting lexicographical order

07acf0a

WIP: Correctly encode localparts

4c5018c

WIP: Make zitadel IDP work

372d17c

WIP: Clean up rebase debris

3ebe290

tlater-famedly force-pushed the tlater/remove-cache branch from 15dca27 to 3ebe290 Compare November 14, 2024 14:15

tlater-famedly commented Nov 14, 2024

WIP: Deal with new clippy lint?

65d5928

tlater-famedly force-pushed the tlater/remove-cache branch from 18842a2 to 65d5928 Compare November 14, 2024 14:28

jannden requested changes Nov 14, 2024

src/main.rs (outdated, resolved)

src/sources/csv.rs (outdated, resolved)

src/user.rs (outdated, resolved)

src/user.rs (resolved)

jannden added 2 commits November 18, 2024 10:29

fix: Hex encode uid in CSV import

77543e0

test: SSO Linking

7b1b501

jannden approved these changes Nov 18, 2024
